(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(19) World Intellectual Property Organization
International Bureau

(43) International Publication Date
13 December 2001 (13.12.2001)

(10) International Publication Number
WO 01/95633 A2

(51) International Patent Classification: H04N 7/26

(21) International Application Number: PCT/US01/40811

(22) International Filing Date: 25 May 2001 (25.05.2001)

(25) Filing Language: English

(26) Publication Language: English


(30) Priority Data:
09/590,928    9 June 2000 (09.06.2000)    US


(71) Applicant (for all designated States except US): GENERAL
INSTRUMENT CORPORATION [US/US]; 101 Tournament Drive,
Horsham, PA 19044 (US).

(72) Inventors; and

(75) Inventors/Applicants (for US only): PANUSOPONE, Krit
[TH/US]; 9656 Carroll Canyon Road #1-4, San Diego,
CA 92120 (US). CHEN, Xuemin [US/US]; 8560 Foxcroft
Place, San Diego, CA 92129 (US).

(74) Agent: LIPSITZ, Barry, R.; Law Offices of Barry R. Lipsitz,
Building No. 8, 755 Main Street, Monroe, CT 06468
(US).


(81) Designated States (national): AE, AG, AL, AM, AT, AU,
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU,
CZ, DE, DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM,
HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK,
LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX,
MZ, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL,
TJ, TM, TR, TT, TZ, UA, UG, US, UZ, VN, YU, ZA, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), Eurasian
patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE,
IT, LU, MC, NL, PT, SE, TR), OAPI patent (BF, BJ, CF,
CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).

Published:

without international search report and to be republished
upon receipt of that report

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning
of each regular issue of the PCT Gazette.


(54) Title: VIDEO SIZE CONVERSION AND TRANSCODING FROM MPEG-2 TO MPEG-4


[Front-page drawing (FIG. 3, transcoder 300): MPEG-2 header -> look-up table -> MPEG-4 header; MPEG-2 bitstream -> variable length decoding (310) -> inverse scan (320) -> inverse quantization (330) -> quantization (340) -> AC/DC prediction (350) -> scan/run-length coding -> variable length encoding -> MPEG-4 bitstream; path 312 carries the header information.]


(57) Abstract: A transcoder architecture that provides the lowest possible complexity with a small error, e.g., for converting an
MPEG-2 bitstream into an MPEG-4 bitstream. The transcoder reads header information (304) from an input bitstream and provides
a corresponding header in the new format for the output bitstream. In one embodiment (Fig. 3), a low complexity front-to-back
transcoder (with B frames disabled) avoids the need for motion compensation processing. In another embodiment (Fig. 4), a
transcoder architecture that minimizes drift error (with B frames enabled) is provided. In another embodiment (Fig. 5), a size
transcoder (with B frames enabled) is provided, e.g., to convert a bitstream of ITU-R 601 interlaced video coding with MPEG-2
MP@ML into a simple profile MPEG-4 bitstream which contains SIF progressive video suitable for a streaming video application.
For spatial downscaling of field-mode DCT blocks, vertical and horizontal downscaling techniques are combined to use sparse
matrixes to reduce computations.

VIDEO SIZE CONVERSION AND TRANSCODING FROM MPEG-2 TO MPEG-4

BACKGROUND OF THE INVENTION 

The present invention relates to compression of 
multimedia data and, in particular, to a video 
transcoder that allows a generic MPEG-4 decoder to 
decode MPEG-2 bitstreams. Temporal and spatial size 
conversion (downscaling) are also provided. 

The following acronyms and terms are used: 

CBP - Coded Block Pattern 

DCT - Discrete Cosine Transform 

DTV - Digital Television 

DVD - Digital Video Disc 

HDTV - High Definition Television 

FLC - Fixed Length Coding

IP - Internet Protocol 

MB - Macroblock 

ME - Motion Estimation 

ML - Main Level 

MP - Main Profile 

MPS - MPEG-2 Program Stream 

MTS - MPEG-2 Transport Stream 

MV - Motion Vector 

QP - quantization parameter 

PMV - Prediction Motion Vector 

RTP - Real-Time Transport Protocol (RFC 1889) 

SDTV - Standard Definition Television 

SIF - Standard Intermediate Format

SVCD - Super Video Compact Disc

VLC - Variable Length Coding

VLD - Variable Length Decoding

VOP - Video Object Plane

MPEG-4, the multimedia coding standard, provides a 
rich functionality to support various applications, 
including Internet applications such as streaming 
media, advertising, interactive gaming, virtual 
traveling, etc. Streaming video over the Internet
(multicast), which is expected to be among the most
popular applications for the Internet, is also well
suited for use with the MPEG-4 visual standard (ISO/IEC
14496-2 Final Draft of International Standard (MPEG-4),
"Information Technology - Generic coding of audio-visual
objects, Part 2: visual," Dec. 1998).

MPEG-4 visual handles both synthetic and natural 
video, and accommodates several visual object types, 
such as video, face, and mesh objects. MPEG-4 visual 
also allows coding of an arbitrarily shaped object so 
that multiple objects can be shown or manipulated in a 
scene as desired by a user. Moreover, MPEG-4 visual is 
very flexible in terms of coding and display 
configurations by including enhanced features such as 
multiple auxiliary (alpha) planes, variable frame rate,
and geometrical transformations (sprites).

However, the majority of the video material (e.g.,
movies, sporting events, concerts, and the like) which
is expected to be the target of streaming video is
already compressed by the MPEG-2 system and stored on
storage media such as DVDs, computer memories (e.g.,
server hard disks), and the like. The MPEG-2 System
specification (ISO/IEC 13818-2 International Standard
(MPEG-2), "Information Technology - Generic coding of
Moving Pictures and Associated Audio: Part 2 - Video,"
1995) defines two system stream formats: the MPEG-2
Transport Stream (MTS) and the MPEG-2 Program Stream
(MPS). The MTS is tailored for communicating or
storing one or more programs of MPEG-2 compressed data 
and also other data in relatively error-prone 
environments. One typical application of MTS is DTV. 
The MPS is tailored for relatively error-free 
environments. The popular applications include DVD and 
SVCD. 

Attempts to address this issue have been 
unsatisfactory to date. For example, the MPEG-4 studio 
profile (O. Sunohara and Y. Yagasaki, "The draft of 
MPEG-4 Studio Profile Amendment Working Draft 2.0," 
ISO/IEC JTC1/SC29/WG11 MPEG99/5135, Oct. 1999) has 
proposed an MPEG-2 to MPEG-4 transcoder, but that
process is not applicable to the other MPEG-4 version 1
profiles, which include the Natural Visual profiles
(Simple, Simple Scaleable, Core, Main, N-Bit),
Synthetic Visual profiles (Scaleable Texture, Simple
Face Animation), and Synthetic/Natural Hybrid Visual
(Hybrid, Basic Animated Texture). The studio profile
is not applicable to the Main Profile of MPEG-4 version
1 since it modifies the syntax, and the decoder process
is incompatible with the rest of the MPEG-4 version 1
profiles.

The MPEG standards designate several sets of 
constrained parameters using a two-dimensional ranking 
order. One of the dimensions, called the "profile" 
series, specifies the coding features supported. The 
other dimension, called "level", specifies the picture 
resolutions, bit rates, and so forth, that can be 
accommodated.

For MPEG-2, the Main Profile at Main Level, or
MP@ML, supports a 4:2:0 color subsampling ratio, and I,
P and B pictures. The Simple Profile is similar to the 
Main Profile but has no B-pictures. The Main Level is 
defined for ITU-R 601 video, while the Simple Level is 
defined for SIF video. 

Similarly, for MPEG-4, the Simple Profile contains 
SIF progressive video (and has no B-VOPs or interlaced
video). The Main Profile allows B-VOPs and interlaced
video.

Accordingly, it would be desirable to achieve 
interoperability among different types of end-systems 
by the use of MPEG-2 video to MPEG-4 video transcoding 
and/or MPEG-4 video to MPEG-2 video transcoding. The
different types of end-systems that should be
accommodated include:

Transmitting Interworking Unit (TIU): Receives
MPEG-2 video from a native MTS (or MPS) system and
transcodes to MPEG-4 video and distributes over packet
networks using a native RTP-based system layer (such as
an IP-based internetwork). Examples include a real-time
encoder, an MTS satellite link to Internet, and a
video server with MPS-encoded source material.

Receiving Interworking Unit (RIU): Receives MPEG-4
video in real time from an RTP-based network and then
transcodes to MPEG-2 video (if possible) and forwards
to a native MTS (or MPS) environment. Examples include
an Internet-based video server to MTS-based cable
distribution plant.

Transmitting Internet End-System (TIES): Transmits
MPEG-2 or MPEG-4 video generated or stored within the
Internet end-system itself, or received from
internet-based computer networks. Examples include a video
server.

Receiving Internet End-System (RIES): Receives
MPEG-2 or MPEG-4 video over an RTP-based internet for
consumption at the Internet end-system or forwarding to
a traditional computer network. Examples include a
desktop PC or workstation viewing a training video.

It would be desirable to determine similarities 
and differences between MPEG-2 and MPEG-4 systems, and 
provide transcoder architectures which yield a low 
complexity and small error. 

The transcoder architectures should be provided
for systems where B-frames are enabled (e.g., main
profile), as well as a simplified architecture for when
B-frames are not used (simple profile).

Format (MPEG-2 to MPEG-4) and/or size transcoding 
should be provided. 

It would also be desirable to provide an efficient
mapping from the MPEG-2 to MPEG-4 syntax, including a 
mapping of headers. 

The system should include size transcoding,
including spatial and temporal transcoding.

The system should allow size conversion at the 
input bitstream or output bitstream of a transcoder. 

The size transcoder should convert a bitstream of
ITU-R 601 interlaced video coded with MPEG-2 MP@ML into
a simple profile MPEG-4 bitstream which contains SIF
progressive video suitable, e.g., for a streaming video
application.

The system should provide an output bitstream that
can fit in the practical bandwidth for a streaming
video application (e.g., less than 1 Mbps).

The present invention provides a system having the 
above and other advantages. 

SUMMARY OF THE INVENTION

The invention relates to format transcoding (MPEG-2
to MPEG-4) and size (spatial and temporal)
transcoding.

A proposed transcoder includes size conversion,
although size can alternatively be converted at either
the input bitstream or the output bitstream. It is more
efficient, however, to combine all types of transcoding
in the product version of a transcoder, because the
transcoders share processing elements (such as a
bitstream reader) and combining them reduces the
overall complexity.

The invention addresses the most important 
requirements for a transcoder, e.g., the complexity of 
the system and the loss generated by the process. 

In one embodiment, a proposed front-to-back
transcoder architecture reduces complexity because
there is no need to perform motion compensation.

In a particular embodiment, the transcoder can use
variable 5-bit QP representation, and eliminates AC/DC
prediction and the nonlinear DC scaler.

The invention is alternatively useful for rate 
control and resizing. 

A particular method for transcoding a pre-compressed
input bitstream that is provided in a first
video coding format includes the steps of: recovering
header information of the input bitstream; providing
corresponding header information in a second, different
video coding format; partially decompressing the input
bitstream to provide partially decompressed data; and
re-compressing the partially decompressed data in
accordance with the header information in the second
format to provide the output bitstream.

A method for performing 2:1 downscaling on video
data includes the steps of: forming at least one input
matrix of NxN (e.g., N=16) Discrete Cosine Transform
(DCT) coefficients from the video data by combining
four N/2xN/2 field-mode DCT blocks; performing vertical
downsampling and de-interlacing to the input matrix to
obtain two N/2xN/2 frame-mode DCT blocks; forming an
NxN/2 input matrix from the two frame-mode DCT blocks;
and performing horizontal downsampling to the NxN/2
matrix to obtain one N/2xN/2 frame-mode DCT block.

Preferably, the vertical and horizontal
downsampling use respective sparse downsampling
matrixes. In particular, a vertical downsampling
matrix of 0.5 [I8 I8] may be used, where I8 is an 8x8
identity matrix. This is essentially vertical pixel
averaging. A horizontal downsampling matrix composed
of odd "O" and even "E" matrices may be used.
Corresponding apparatuses are also presented.
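
As a rough illustration of the vertical step only, the sketch below applies 0.5 [I8 I8] to a top-field DCT block stacked above a bottom-field DCT block, which reduces to an element-wise average of the two blocks. A naive DCT is included solely to check that this equals the DCT of the averaged pixels. The function names, the stacking order, and the orthonormal DCT scaling are assumptions made for the example; the horizontal "O" and "E" matrices are not shown because this section does not give their entries.

```python
import math

N = 8  # DCT block size


def dct_2d(block):
    """Naive orthonormal 8x8 2-D DCT-II, used here only for checking."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N)))
            out[v][u] = c(v) * c(u) * s
    return out


def vertical_downsample(top_field_dct, bottom_field_dct):
    """Apply 0.5 * [I8 I8] to the stacked pair of field-mode DCT blocks.

    Multiplying the 16x8 stack [top; bottom] by the 8x16 matrix
    0.5 * [I8 I8] is the same as averaging the two blocks element-wise.
    """
    return [[0.5 * (top_field_dct[v][u] + bottom_field_dct[v][u])
             for u in range(N)] for v in range(N)]
```

Because the DCT is linear, averaging the two field blocks in the DCT domain gives the same result as averaging the corresponding field lines in the pixel domain and then transforming, which is why no inverse DCT is needed for this step.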

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an MPEG-2 video decoder.

FIG. 2 illustrates an MPEG-4 video decoder without 
any scalability feature. 

FIG. 3 illustrates a low complexity front-to-back
transcoder (with B frames disabled) in accordance with
the invention.

FIG. 4 illustrates a transcoder architecture that 
minimizes drift error (with B frames enabled) in 
accordance with the invention. 

FIG. 5 illustrates a size transcoder in accordance
with the invention.

FIG. 6 illustrates downsampling of four field mode
DCT blocks to one frame mode DCT block in accordance
with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The invention relates to format transcoding (MPEG-2
to MPEG-4) and size (spatial and temporal)
transcoding.

The invention provides bit rate transcoding to
convert a pre-compressed bitstream into another
compressed bitstream at a different bit rate. Bit rate
transcoding is important, e.g., for streaming video
applications because the network bandwidth is not
constant and, sometimes, a video server needs to reduce
the bit rate to cope with the network traffic demand.
A cascaded-based transcoder which re-uses MVs from the
input bitstream and, hence, eliminates motion
estimation (ME), is among the most efficient of the bit
rate transcoders. The cascaded-based transcoder
decodes the input bitstream to obtain the MV and form
the reference frame. It then encodes this information
with a rate control mechanism to generate an output
bitstream at the desired bit rate.

Spatial resolution transcoding becomes a big issue
with the co-existence of HDTV and SDTV in the near
future. It is also very beneficial for the streaming
video application since it is likely that the Internet
bandwidth is not going to be large enough for broadcast
quality video. Hence, downsampling of the broadcast
quality bitstream into a bitstream with a manageable
resolution is appealing. Spatial resolution
transcoding is usually performed in the compressed (DCT)
domain since this drastically reduces the complexity of
the system. The process of downsampling in the
compressed domain involves the processing of two
parameters, namely DCT coefficients and MVs. A
downsampling filter and its fast algorithm are suggested
to perform DCT coefficient downsampling. MV resampling
is used to find the MV of the downsampled video. In
the real product, to avoid drift, the residual of the
motion compensation should be re-transformed instead of
approximating the DCT coefficients from the input
bitstream.

2. High level comparison

Structure-wise, MPEG-2 and MPEG-4 employ a similar
video compression algorithm. Fundamentally, both
standards adopt motion prediction to exploit temporal
correlation and quantization in the DCT domain to use
spatial correlation within a frame. This section
describes the structure of the MPEG-2 and MPEG-4
decoders at a high level, and then notes differences
between the two standards.

2.1 MPEG-2

FIG. 1 shows the simplified video decoding process
of MPEG-2. In the decoder 100, coded video data is
provided to a variable length decoding function 110 to
provide the one-dimensional data QFS[n], where n is a
coefficient index in the range of 0-63. At the inverse
scan function 120, QFS[n] is converted into a
two-dimensional array of coefficients denoted by QF[v][u],
where the array indexes u and v both lie in the range 0
to 7. An inverse quantisation function 130 applies the
appropriate inverse quantisation arithmetic to give the
final reconstructed, frequency-domain DCT coefficients,
F[v][u]. An inverse DCT function 140 produces the
pixel (spatial) domain values f[y][x]. A motion
compensation function 150 is responsive to a frame
store memory 160 and the values f[y][x] for producing
the decoded pixels (pels) d[y][x], where y and x are
Cartesian coordinates in the pixel domain.
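
The inverse scan step can be sketched as follows. This is a minimal illustration that assumes the conventional zigzag scan pattern; MPEG-2 also defines an alternate scan, which is not shown. The function names are chosen for this example and do not come from the standard.

```python
def zigzag_order(n=8):
    """Return the (v, u) position visited at each scan index of a zigzag scan."""
    order = []
    for s in range(2 * n - 1):  # s = v + u selects one anti-diagonal
        vs = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked with v decreasing, odd ones with v increasing.
        for v in (reversed(vs) if s % 2 == 0 else vs):
            order.append((v, s - v))
    return order


def inverse_scan(qfs):
    """Convert the 64-entry list QFS[n] into the 8x8 array QF[v][u]."""
    qf = [[0] * 8 for _ in range(8)]
    for n, (v, u) in enumerate(zigzag_order(8)):
        qf[v][u] = qfs[n]
    return qf
```

Feeding in the sequence 0..63 makes the scan path visible, since each QF[v][u] then holds its own scan index.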

MPEG-2 operates on a macroblock level for motion
compensation, a block level for the DCT transformation,
and the coefficient level for run-length and lossless
coding. Moreover, MPEG-2 allows three types of
pictures, namely I-, P- and B-pictures. Allowed
motion prediction modes (forward, backward,
bi-directional) are specified for the P- and B-pictures.
MPEG-2 uses interlaced coding tools to handle
interlaced sources more efficiently.

2.2 MPEG-4

FIG. 2 shows the MPEG-4 video decoding process
without any scalability features.

At the decoder 200, data from a channel is output
from a demux 210. A coded bit stream of shape data is
provided to a switch 215, along with the MPEG-4 term
video_object_layer_shape (which indicates, e.g.,
whether or not the current image is rectangular, binary
only, or grayscale). If video_object_layer_shape is
equal to "00" then no binary shape decoding is
required. Otherwise, binary shape decoding is carried
out.

If binary shape decoding is performed, a shape
decoding function 220 receives the previous
reconstructed VOP 230 (which may be stored in a
memory), and provides a shape-decoded output to a
motion compensation function 240. The motion
compensation function 240 receives an output from a
motion decoding function 235, which, in turn, receives
a motion coded bit stream from the demux 210. The
motion compensation function 240 also receives the
previous reconstructed VOP 230 to provide an output to
a VOP reconstruction function 245.

The VOP reconstruction function 245 also receives 
data from a texture decoding function 250 which, in 
turn, receives a texture coded bit stream from the 
demux 210, in addition to an output from the shape 
decoding function 220. The texture decoding function 
250 includes a variable length decoding function 255, 
an inverse scan function 260, an inverse DC and AC
prediction function 270, an inverse quantization 
function 280 and an Inverse DCT (IDCT) function 290. 

Compared to MPEG-2, several new tools are adopted 
in MPEG-4 to add features and interactivity, e.g., 
sprite coding, shape coding, still texture coding, 
scalability, and error resilience. Moreover, motion 
compensation and texture coding tools in MPEG-4, which 
are similar to MPEG-2 video coding, are modified to 
improve the coding efficiency, e.g., coding tools such 
as direct mode motion compensation, unrestricted motion
compensation, and advanced prediction.

In particular, direct mode motion compensation is 
used for B-VOPs. Specifically, it uses direct bi- 
directional motion compensation derived by employing I- 
or P-VOP macroblock MVs and scaling them to derive 
forward and backward MVs for macroblocks in B-VOP. 
Only one delta MV is allowed per macroblock. The 
actual MV is calculated from the delta vector and the 
scaled MV from its co-located macroblock.
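
The scaling described above can be sketched per MV component as follows. The formulas reflect a common reading of the MPEG-4 Visual direct-mode derivation and are not quoted from this document; the names `trb` (temporal distance from the past reference VOP to the B-VOP) and `trd` (temporal distance between the two reference VOPs), and the truncating division, are assumptions of this example.

```python
def direct_mode_mvs(mv, mvd, trb, trd):
    """Derive forward and backward MV components for a direct-mode macroblock.

    mv  -- MV component of the co-located macroblock in the future reference
    mvd -- delta MV component transmitted for the B-VOP macroblock
    trb -- assumed temporal distance from the past reference to the B-VOP
    trd -- assumed temporal distance between the past and future references
    """
    # Forward MV: scaled co-located MV plus the delta vector.
    mvf = int(trb * mv / trd) + mvd  # int() truncates toward zero (assumption)
    # Backward MV: scaled directly when no delta is sent, else derived from mvf.
    if mvd == 0:
        mvb = int((trb - trd) * mv / trd)
    else:
        mvb = mvf - mv
    return mvf, mvb
```

For example, with a co-located MV component of 6, a B-VOP one third of the way between its references (`trb=1`, `trd=3`), and no delta, the forward component is one third of the original and the backward component points the remaining two thirds in the opposite direction.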

Unrestricted motion compensation allows one or
four MVs per macroblock. The four MV mode is only
possible in B-VOPs with the use of direct mode. Note
that the MV for a chrominance macroblock is the average
of four MVs from its associated luminance macroblock.
Furthermore, unrestricted motion compensation allows an
MV to point out of the reference frame (the out-of-bound
texture is padded from the edge pixel).

Advanced prediction defines the prediction method 
for MV and DCT coefficients. An MV predictor is set
according to the median value of its three neighbors' 
MVs. Prediction of the intra DCT coefficient follows 
the intra AC/DC prediction procedure (Graham's rule). 
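
A minimal sketch of the median MV predictor follows, assuming each MV is an (x, y) tuple and that the median is taken separately per component. Which three neighbors are used, and how missing neighbors are handled, is left out and would follow the standard.

```python
def median_mv_predictor(mv1, mv2, mv3):
    """Component-wise median of three neighboring MVs given as (x, y) tuples."""
    px = sorted((mv1[0], mv2[0], mv3[0]))[1]
    py = sorted((mv1[1], mv2[1], mv3[1]))[1]
    return (px, py)


def mv_residual(mv, pmv):
    """Residual that is actually coded: the MV minus its predictor (PMV)."""
    return (mv[0] - pmv[0], mv[1] - pmv[1])
```

Note that the predictor need not equal any single neighbor's MV, because the x and y medians can come from different neighbors.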

3. Transcoder architecture 

FIG. 3 illustrates a low complexity front-to-back
transcoder in accordance with the invention, with B
frames disabled.

Similarities between the structures of MPEG-2 and
MPEG-4 allow a low complexity (front-to-back)
transcoder. Instead of completely decoding an MPEG-2
bitstream to the spatial (pixel) domain level, the
front-to-back transcoder 300 uses DCT coefficients and
MVs to generate an MPEG-4 bitstream without actually
performing a motion estimation process. A trade-off is
that this architecture may cause a drift in the
reconstructed frame, and does not allow bit rate
control. However, the drift problem is small since
most of the difference between the MPEG-2 and MPEG-4
decoders lies in the lossless coding part.

The transcoder 300 comprises a cascade of an MPEG-2
bitstream reader (decoder) (310-330) and an MPEG-4
header and texture coder (encoder) (340-370), along
with a header decoding function 304, a look-up table
308, and a communication path 312. The transcoder 300
reads an input MPEG-2 bitstream, performs a variable
length decoding (VLD) at a function 310 on DCT
coefficients and MV residual, and then follows MPEG-2
logic to find DCT coefficients and/or MVs of every
block in the frame.

The header decoding function 304 decodes the MPEG-2
headers and provides them to a look-up table (or
analogous function) 308, which uses the tables detailed
below to obtain corresponding MPEG-4 headers.

With the information of the headers, DCT
coefficients and/or MV, the transcoder 300 encodes this
information into the MPEG-4 format. Note that the
reference frame is not needed in this architecture.

The transcoder 300 reads the MPEG-2 header from
the input bitstream and writes the corresponding MPEG-4
header in its place in an output bitstream.

After processing at the VLD 310, the data is 
provided to an inverse scan function 320, and an 
inverse quantisation function 330. Next, using the
MPEG-4 header information provided via the path 312,
the decoded DCT coefficient data is processed at an
MPEG-4 header and texture coder that includes a
quantisation function 340, and an AC/DC prediction 
function 350 for differentially encoding the quantised 
DCT coefficients. In particular, the AC/DC prediction 
process generates a residual of DC and AC DCT 
coefficients in an intra MB by subtracting the DC 
coefficient and either the first row or first column of 
the AC coefficients. The predictor is adaptively 
selected. Note that the AC/DC prediction function 350 
may not need the MPEG-4 header information. 
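
The adaptive selection of the DC predictor can be sketched as below. The gradient test follows the usual description of MPEG-4 intra DC prediction, where the neighbor on the left, above-left, and above of the current block are compared; it is not spelled out in this document, and the sketch ignores the DC scaler and AC prediction. The function names are hypothetical.

```python
def select_dc_predictor(dc_left, dc_above_left, dc_above):
    """Pick the DC predictor from the left or above block using local gradients.

    If the horizontal gradient (left vs. above-left) is smaller than the
    vertical one (above-left vs. above), predict from the block above;
    otherwise predict from the block to the left.
    """
    if abs(dc_left - dc_above_left) < abs(dc_above_left - dc_above):
        return "above", dc_above
    return "left", dc_left


def dc_residual(dc_current, dc_left, dc_above_left, dc_above):
    """Residual DC value that would be passed on for entropy coding."""
    _, predictor = select_dc_predictor(dc_left, dc_above_left, dc_above)
    return dc_current - predictor
```

The same direction decision is normally reused to choose between the first row and the first column for AC prediction, which matches the "either the first row or first column" wording above.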

Subsequently, a scan/run-length coding function
360 and a variable length encoding function 370 provide
the MPEG-4 bitstream.

FIG. 4 illustrates a transcoder architecture that 
minimizes drift error in accordance with the invention, 
with B frames enabled. 

Like-numbered elements correspond to one another
in the figures.

To counter the problems of drift in the
reconstructed frame, and the lack of bit rate control,
a more complex architecture such as the transcoder 400,
which is an extension of the transcoder 300 of FIG. 3,
can be used. This architecture actually computes the
DCT coefficient of the texture/residual data, hence
motion compensation is required. Since the encoder of
this transcoder includes a decoding process, the drift
error can be minimized.

Moreover, the transcoder 400 can be used to
transcode bitstreams with B-frames since MPEG-4 does
not allow intra mode for B-frames. The transcoder 400
treats a block in intra mode in a B-frame (in MPEG-2)
as a block with a zero MV in inter mode (in MPEG-4). It
can be either a zero residual MV (PMV) or zero MV
(which may yield a non-zero MV code) since the MV is
predictive coded against the PMV.
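
The two options for such a converted block can be sketched as follows. This is an illustrative helper under the assumption that MVs are (x, y) tuples and that the coded value is the MV minus the PMV; the function name and the `use_zero_mv` flag are hypothetical.

```python
def convert_intra_block_mv(pmv, use_zero_mv=True):
    """Return (mv, coded_residual) for an MPEG-2 intra block in a B-frame
    that is re-coded as an inter block in MPEG-4.

    use_zero_mv=True  -- true zero MV; the coded residual is -PMV, which is
                         non-zero whenever the predictor is non-zero
    use_zero_mv=False -- zero residual; the block inherits the PMV as its MV
    """
    mv = (0, 0) if use_zero_mv else pmv
    residual = (mv[0] - pmv[0], mv[1] - pmv[1])
    return mv, residual
```

With a predictor of (2, -1), the zero-MV option codes a residual of (-2, 1), while the zero-residual option codes (0, 0) and uses (2, -1) as the block's MV.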

In particular, the transcoder 400 includes a 
variable length decoding function 405 that provides MV 
residue data to a MV decoder 425, and that provides DCT 
coefficient data to the inverse scan function 320. The 
DCT data is processed by the inverse quantisation 
function 330 and an inverse DCT function 420 to obtain
pixel domain data. Intra-coded pixel data is provided 
via a path 422 to a buffer, while inter-coded pixel 
data is provided to an adder 435 via a path 424. 

The pixel (difference) data on path 424 is added 
to reference pixel data from a motion compensation 
function 430 (responsive to the MV decoder 425) to 
provide inter-coded data to the buffer 450 via a path 
448. 

For re-encoding, e.g., in the MPEG-4 format, the
buffer 450 either outputs the intra pixel data directly
to a DCT function 455, or outputs the inter pixel data
to a subtracter 445, where a difference relative to an
output from a motion compensation function 440
(responsive to the MV decoder 425) is provided to the
DCT function 455.

The DCT coefficients are provided from the DCT
function 455 to the quantisation function 340, and the
quantised DCT data is then provided to the AC/DC (DCT
coefficient) prediction function 350, where AC and DC
residuals of the current MB are generated. These
residuals of DCT coefficients are entropy coded. The
output data is provided to the scan/run-length coding
function 360, and the output thereof is provided to the
variable length encoding function 370 to obtain the
MPEG-4 compliant bitstream.

The quantised DCT coefficients are also output
from the quantisation function 340 to an inverse
quantisation function 495, the output of which is
provided to an inverse DCT function 490, the output of
which is summed at an adder 485 with the output of the
motion compensation function 440. The output of the
adder 485 is provided to a buffer 480, and subsequently
to the motion compensation function 440.

The header decoding function 304 and look-up table 
308 and path 312 operate as discussed in connection 
with FIG. 3 to control the re-encoding to the MPEG-4 
format at functions 340-370. 

4. Implementation of the Format Transcoder

This section explains the implementation of the 
format transcoding, e.g., as implemented in FIGs 3 and 
4, discussed above, and FIG. 5, to be discussed later. 
Minor implementation details (e.g., systems-related 
details such as the use of time stamps and the like) 
that are not specifically discussed should be apparent 
to those skilled in the art. 

In a particular implementation, the transcoders of
the present invention can be used to convert a
main-profile, main-level (MP@ML) MPEG-2 bitstream into a
main-profile MPEG-4 bitstream. It is assumed that the
MPEG-2 bitstream is coded in frame picture structure
with B-picture coding (no dual prime prediction).
Generally, the same coding mode which is used in MPEG-2 
coding should be maintained. This mode is likely to be 
optimum in MPEG-4 and hence avoids the complexity of 
the mode decision process. The transparency pattern in 
MPEG-4 is always 1 (one rectangular object with the 
same size of VOP in one VOP) . That is, MPEG-4 allows 
an arbitrarily shaped object which is defined by a 
nonzero transparency pattern. This feature does not 
exist in MPEG-2 so we can safely assume that all 
transparency patterns of the transcoding object are one.

4.1 MPEG-2 bitstream reader

A transcoder in accordance with the invention 
obtains the bitstream header, DCT coefficients and MVs 
from the MPEG-2 bitstream. This information is mixed 
together in the bitstream. Both MPEG-2 and MPEG-4 
bitstreams adopt a hierarchical structure consisting of
several layers. Each layer starts with the header
followed by multiple instances of its sublayer. In this
implementation, as shown in Table 1, the MPEG-2 layer
has a direct translation into the MPEG-4 layer, except
the slice layer in MPEG-2, which is not used in MPEG-4.
DC coefficients and predicted MVs in MPEG-4 are reset
at the blocks that start the slice.

However, some MPEG-4 headers are different from
MPEG-2 headers, and vice versa. Fortunately, the
restrictions in MPEG-2 and MPEG-2 header information
are sufficient to specify an MPEG-4 header. Tables 2
through 6 list MPEG-4 headers and their relation to an
MPEG-2 header or restriction at each layer.

Table 1. Relationship between MPEG-2 and MPEG-4 layers

MPEG-2                        MPEG-4
----------------------------  -----------------------------------------------
Video Sequence                Video Object Sequence (VOS) / Video Object (VO)
Sequence Scalable Extension   Video Object Layer (VOL)
Group of Picture (GOP)        Group of Video Object Plane (GOV)
Picture                       Video Object Plane (VOP)
Macroblock                    Macroblock
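
Table 1 amounts to a small look-up from MPEG-2 layer names to MPEG-4 layer names. A minimal sketch follows; the dictionary and function names are hypothetical, and returning `None` for the slice layer is simply one way to represent "no MPEG-4 counterpart".

```python
# Layer correspondence from Table 1; the slice layer is deliberately absent.
MPEG2_TO_MPEG4_LAYER = {
    "Video Sequence": "Video Object Sequence (VOS) / Video Object (VO)",
    "Sequence Scalable Extension": "Video Object Layer (VOL)",
    "Group of Picture (GOP)": "Group of Video Object Plane (GOV)",
    "Picture": "Video Object Plane (VOP)",
    "Macroblock": "Macroblock",
}


def map_layer(mpeg2_layer):
    """Return the MPEG-4 layer for an MPEG-2 layer, or None if there is none."""
    return MPEG2_TO_MPEG4_LAYER.get(mpeg2_layer)
```

A caller that receives `None` for "Slice" would drop the slice header and apply the predictor resets described above at the first block of the slice.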

Table 2. MPEG-4 header and its derivation (VOS and VO)

Header                             Code                 Comment
---------------------------------  -------------------  ----------------------------------------------------
Visual_object_sequence_start_code  000001B0             Initiate a visual session
Profile_and_level_indication       00110100             Main Profile/Level 4
Visual_object_sequence_end_code    000001B1             Terminate a visual session
Visual_object_start_code           000001B5             Initiate a visual object
Is_visual_object_identifier        0                    No version identification or priority needs to be specified
Visual_object_type                 0001                 Video ID
Video_object_start_code            0000010X-0000011X    Mark a new video object
Video_signal_type                  Derived from MPEG-2  Corresponds to MPEG-2 sequence_display_extension_id
Video_format                       Same as MPEG-2       Corresponds to MPEG-2 sequence_display_extension_id
Video_range                        Derived from MPEG-2  Corresponds to MPEG-2 sequence_display_extension_id
Colour_description                 Same as MPEG-2       Corresponds to MPEG-2 sequence_display_extension_id
Colour_primaries                   Same as MPEG-2       Corresponds to MPEG-2 colour_description
Transfer_characteristics           Same as MPEG-2       Corresponds to MPEG-2 colour_description
Matrix_coefficients                Same as MPEG-2       Corresponds to MPEG-2 colour_description

Table 3. MPEG-4 header and its derivation (VOL)

Header                              Code                  Comment
Video_object_layer_start_code       0000012X              Mark a new video object layer
Random_accessible_vol               0                     Allow non-intra coded VOP
Video_object_type_identification    00000100              Main object type
Is_object_type_identifier           0                     No version identification of priority needs to be specified
Aspect_ratio_info                   Same as MPEG-2        Corresponds to MPEG-2 aspect_ratio_information
Par_width                           Same as MPEG-2        Corresponds to MPEG-2 vertical_size
Par_height                          Same as MPEG-2        Corresponds to MPEG-2 horizontal_size
Vol_control_parameters              Same as MPEG-2        Corresponds to MPEG-2 extension_start_code_identifier (sequence extension)
Chroma_format                       Same as MPEG-2        Corresponds to MPEG-2 chroma_format
Low_delay                           Same as MPEG-2        Corresponds to MPEG-2 low_delay
Vbv_parameters                      Recomputed            Follow MPEG-4 VBV spec.
Video_object_layer_shape            00                    Rectangular
Vop_time_increment_resolution       Recomputed            See Table 7
Fixed_vop_rate                      1                     Indicate that all VOPs are coded at a fixed rate
Fixed_vop_time_increment            Recomputed            See Table 7
Video_object_layer_width            Same as MPEG-2        Corresponds to display_vertical_size
Video_object_layer_height           Same as MPEG-2        Corresponds to display_horizontal_size
Interlaced                          Same as MPEG-2        Corresponds to progressive_sequence
Obmc_disable                        1                     Disable OBMC
Sprite_enable                       0                     Indicate absence of sprite
Not_8_bit                           Derived from MPEG-2   Corresponds to MPEG-2 intra_dc_precision
Quant_type                          1                     MPEG quantization
Complexity_estimation_disable       1                     Disable complexity estimation header
Resync_marker_disable               1                     Indicate absence of resync_marker
Data_partitioned                    0                     Disable data partitioning
Reversible_vlc                      0                     Disable reversible VLC
Scalability                         0                     Indicate that the current layer is used as base-layer

Table 4. MPEG-4 header and its derivation (VOP)


Header                              Code                  Comment
Vop_start_code                      000001B6              Mark a start of a video object plane
Vop_coding_type                     Same as MPEG-2        Corresponds to MPEG-2 picture_coding_type
Modulo_time_base                    Regenerated           Follow MPEG-4 spec.
Vop_time_increment                  Regenerated           Follow MPEG-4 spec.
Vop_coded                           1                     Indicate that subsequent data exists for the VOP
Vop_rounding_type                   0                     Set value of rounding_control to '0'
Change_conversion_ratio_disable     1                     Assume that conv_ratio is '1' for all macroblocks
Vop_constant_alpha                  0                     Not include vop_constant_alpha_value in the bitstream
Intra_dc_vlc_thr                    0                     Use intra DC VLC for entire VOP
Top_field_first                     Same as MPEG-2        Corresponds to MPEG-2 top_field_first
Alternate_vertical_scan_flag        Same as MPEG-2        Corresponds to MPEG-2 alternate_scan
Vop_quant                           Derived from MPEG-2   Corresponds to MPEG-2 quantiser_scale_code
Vop_fcode_forward                   Same as MPEG-2        See section 4.3
Vop_fcode_backward                  Same as MPEG-2        See section 4.3


Table 5. MPEG-4 header and its derivation (macroblock and MV)

Header                   Code                  Comment
Not_coded                Derived from MPEG-2   Corresponds to MPEG-2 macroblock_address_increment
Mcbpc                    Derived from MPEG-2   Corresponds to MPEG-2 macroblock_type
Ac_pred_flag             0                     Disable intra AC prediction
Cbpy                     Derived from MPEG-2   See section 4.2
Dquant                   Derived from MPEG-2   See section 4.2
Modb                     Derived from MPEG-2   Corresponds to macroblock_type
Mb_type                  Derived from MPEG-2   Corresponds to macroblock_type
Cbpb                     Derived from MPEG-2   See section 4.2
Dbquant                  Derived from MPEG-2   See section 4.2
Horizontal_mv_data       Derived from MPEG-2   Corresponds to MPEG-2 motion_code[r][s][0]
Vertical_mv_data         Derived from MPEG-2   Corresponds to MPEG-2 motion_code[r][s][1]
Horizontal_mv_residual   Derived from MPEG-2   Corresponds to MPEG-2 motion_residual[r][s][0]
Vertical_mv_residual     Derived from MPEG-2   Corresponds to MPEG-2 motion_residual[r][s][1]


Table 6. MPEG-4 header and its derivation (block and interlaced information)

Header                            Code                  Comment
Dct_dc_size_luminance             Same as MPEG-2        Corresponds to MPEG-2 dct_dc_size_luminance
Dct_dc_differential               Same as MPEG-2        Corresponds to dct_dc_differential
Dct_dc_size_chrominance           Same as MPEG-2        Corresponds to MPEG-2 dct_dc_size_chrominance
DCT_coefficient                   Derived from MPEG-2   See section 4.2
DCT_type                          Same as MPEG-2        Corresponds to MPEG-2 DCT_type
Field_prediction                  Same as MPEG-2        Corresponds to MPEG-2 frame_motion_type
Forward_top_field_reference       Same as MPEG-2        Corresponds to MPEG-2 motion_vertical_field_select[0][0]
Forward_bottom_field_reference    Same as MPEG-2        Corresponds to MPEG-2 motion_vertical_field_select[1][0]
Backward_top_field_reference      Same as MPEG-2        Corresponds to MPEG-2 motion_vertical_field_select[0][1]
Backward_bottom_field_reference   Same as MPEG-2        Corresponds to MPEG-2 motion_vertical_field_select[1][1]


Table 7. Mapping of frame_rate_code in MPEG-2 to
vop_time_increment_resolution and
fixed_vop_time_increment in MPEG-4.

Frame_rate_code   Vop_time_increment_resolution   Fixed_vop_time_increment
0001              24,000                          1001
0010              24
0011              25
0100              30,000                          1001
0101              30
0110              50
0111              60,000                          1001
1000              60
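
The mapping of Table 7 lends itself to a look-up table. The following is a minimal Python sketch, not the transcoder's actual implementation. The names FRAME_RATE_MAP, vop_timing and frame_rate are hypothetical, and the increment of 1 used for the integer frame rates is an assumption chosen so that resolution divided by increment gives the nominal frame rate.

```python
from fractions import Fraction

# frame_rate_code -> (vop_time_increment_resolution, fixed_vop_time_increment)
# ASSUMPTION: an increment of 1 is used for the integer frame rates so that
# resolution / increment equals the nominal frame rate.
FRAME_RATE_MAP = {
    0b0001: (24000, 1001),
    0b0010: (24, 1),
    0b0011: (25, 1),
    0b0100: (30000, 1001),
    0b0101: (30, 1),
    0b0110: (50, 1),
    0b0111: (60000, 1001),
    0b1000: (60, 1),
}


def vop_timing(frame_rate_code):
    """Return (resolution, increment) for an MPEG-2 frame_rate_code."""
    if frame_rate_code not in FRAME_RATE_MAP:
        raise ValueError("unsupported frame_rate_code: %r" % (frame_rate_code,))
    return FRAME_RATE_MAP[frame_rate_code]


def frame_rate(frame_rate_code):
    """Frame rate in frames per second implied by the two MPEG-4 fields."""
    resolution, increment = vop_timing(frame_rate_code)
    return Fraction(resolution, increment)
```

For example, frame_rate(0b0100) evaluates to the exact fraction 30000/1001.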



MV data is stored in the macroblock layer. Up to
four MVs are possible for each macroblock. Moreover, a
MV can be of either field or frame type and have either
full pixel or half pixel resolution. The MPEG-2 MV
decoding process is employed to determine motion_code
(VLC) and motion_residual (FLC) and, hence, delta.
Combined with the predictive MV, delta gives the
field/frame MV. The MV for skipped macroblocks is set
to zero.
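
As an illustration of how delta follows from motion_code and motion_residual, here is a minimal Python sketch modelled on the general shape of MPEG-2 motion vector decoding. The function names are hypothetical, and the range wrap-around shown is an assumption that should be checked against ISO/IEC 13818-2 before use.

```python
def mv_delta(motion_code, motion_residual, f_code):
    """Combine motion_code (VLC) and motion_residual (FLC) into delta."""
    f = 1 << (f_code - 1)          # scale factor implied by f_code
    if f == 1 or motion_code == 0:
        return motion_code         # no residual is used in this case
    delta = (abs(motion_code) - 1) * f + motion_residual + 1
    return -delta if motion_code < 0 else delta


def reconstruct_mv(prediction, motion_code, motion_residual, f_code):
    """Add delta to the predictive MV and wrap into the legal range.

    ASSUMPTION: the legal range is [-16*f, 16*f - 1] with modulo wrap.
    """
    f = 1 << (f_code - 1)
    low, high, span = -16 * f, 16 * f - 1, 32 * f
    vector = prediction + mv_delta(motion_code, motion_residual, f_code)
    if vector < low:
        vector += span
    elif vector > high:
        vector -= span
    return vector
```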

DCT data is stored in the block layer. It is
first decoded from the bitstream (VLC), inverse scanned
using either the zigzag or alternate scanning pattern, and
then inverse quantized. The intra DC coefficient is
determined from dct_dc_differential and the predictor
(the predictor is reset according to the MPEG-2 spec).
DCT coefficients in a skipped macroblock are set to
zero.

4.2 Texture coding 

A transcoder in accordance with the invention
reuses DCT coefficients (for inter frame). The
following guidelines should be used:

1. q_scale_type = 1 (linear scale) is used in 
MPEG-2 quantization. 

2. Only the MPEG quantization method (not H.263)
should be used in MPEG-4 quantization, to reduce the
mismatch (drift) between MPEG-2 and MPEG-4
reconstructed frames.

3. The differential value of the MPEG-2 QP determines
dquant in MPEG-4. Dquant is set to ±2 whenever the
magnitude of the differential value is greater than 2.
Dquant is a 2-bit code which specifies a change in the
quantizer, quant, for I- and P-VOPs.

4. The quantization matrix should be changed 
following the change of matrix in the MPEG-2 bitstream. 

5. The transcoder has the flexibility of
enabling an alternate vertical scanning method (for
interlaced sequence) at the VOL level.

6. Intra AC/DC prediction (which involves
scaling when the QP of the current block is not the
same as that of the predicted block) should be turned
off at a macroblock level to reduce complexity and
mismatch in AC quantization.

7. Higher efficiency can be obtained with the
use of intra_dc_vlc_thr to select the proper VLC table
(AC/DC) for coding of intra DC coefficients, e.g., as a
function of the quantization parameter (except when
intra_dc_vlc_thr is either 0 or 7; these thresholds
will force the use of the intra DC or AC table
regardless of the QP).

8. A skipped macroblock is coded as a not_coded
macroblock (all DCT coefficients are zero).

9. Cbpy and cbpc (CBP) are set according to
code_block_pattern_420 (CBP_420). Note that there is a
slight discrepancy between CBP in MPEG-4 and CBP_420 in
MPEG-2 for an intra macroblock. Specifically, when
CBP_420 is set, it indicates that at least one of the
DCT coefficients in that block is not zero. CBP
contains similar information except it does not
correspond to a DC coefficient in an intra macroblock
(also depending on intra_dc_vlc_thr). Hence, it is
possible that CBP is not zero when CBP_420 is zero in
an intra macroblock (this case can happen in an I-VOP
and P-VOP, but not B-VOP).
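
Guideline 3 amounts to clamping the MPEG-2 QP difference to the range that dquant can express. The following is a minimal Python sketch under that reading. The function name derive_dquant is hypothetical, and returning the resulting MPEG-4 QP alongside dquant is an illustrative choice rather than part of the described method.

```python
def derive_dquant(previous_qp, mpeg2_qp):
    """Clamp the MPEG-2 QP difference to the dquant range of -2..+2.

    Returns (dquant, mpeg4_qp). When the difference exceeds the range,
    mpeg4_qp no longer equals mpeg2_qp, which is the QP coding loss.
    """
    difference = mpeg2_qp - previous_qp
    dquant = max(-2, min(2, difference))
    return dquant, previous_qp + dquant
```

For instance, a jump from QP 10 to QP 15 yields dquant = 2 and an MPEG-4 QP of 12, so three QP steps are not represented for that macroblock.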

There are three sources of loss in texture coding,
namely QP coding, DC prediction and the nonlinear scaler
for DC quantization. MPEG-4 uses differential coding
to code a QP. MPEG-2 allows all 32 possible QP values
at the expense of 5 bits. However, the differential
value can take up to ±2 (in QP value units) and, hence,
a differential value greater than ±2 is lost. This loss
can be minimized by limiting the QP fluctuation among
the macroblocks in the MPEG-2 rate control algorithm.
All intra macroblocks perform adaptive DC prediction,
which may take a different prediction from the previous
macroblock (MPEG-2 DC prediction), thereby causing a
different DC residual for the quantization. DC
coefficients of all intra macroblocks in MPEG-4 are
also quantised in a different manner from MPEG-2
because of the nonlinear scaler. Therefore, quantised
DC coefficients for MPEG-2 and MPEG-4 coding are likely
to be different for an intra macroblock.

4.3 MV coding

The transcoder encodes MVs into an MPEG-4 format.
No error is involved in transcoding a MV from MPEG-2
to MPEG-4, since MV coding is a lossless process. The
following constraints are imposed on an MPEG-4 encoder:

1. Unrestricted motion compensation mode is
disabled, which means no MV points outside the
boundary of the frame.

2. Advanced prediction mode is employed. A
different predictor (a median value) is used in an
MPEG-4 bitstream, but a MV for an 8x8 pel block is not.
That is, advanced prediction mode allows an 8x8 MV and
a nonlinear (median filter) predictor. Only the
nonlinear predictor is used in our format transcoder
(we still keep a 16x16 MV).

3. Direct mode is not allowed in an MPEG-4
bitstream, which means there are only four MV types for 
a B-VOP, i.e., 16x16 forward and backward vectors and 
16x8 forward and backward field vectors. 

4. Field motion compensation is applied whenever 
a 16x8 field vector is used (maintain mode) . 

5. A skipped macroblock is coded as a not_coded
macroblock (motion compensation with zero MV).

6. A single f_code is allowed in MPEG-4.
Therefore, the larger f_code in MPEG-2 between the two
directions (vertical, horizontal) is converted to
f_code in MPEG-4 based on the following relationship:
f_code(MPEG-4) = f_code(MPEG-2) - 1.

7. A padding process is not used since the 
texture for the entire reference frame is known. 

8. Field motion compensation is used whenever
dual prime arithmetic is activated. Vector parities
(field of the reference and field of the predicting
frame) are preserved. Field MVs are generated
according to vector[0][0][1:0], which is coded in the
MPEG-2 bitstream. When prediction of the same parity
is used (e.g., top field to top field, or bottom field
to bottom field), both field MVs are vector[0][0][1:0].
When prediction of the odd parity is used (e.g., top
field to bottom field, or bottom field to top field),
the top field MV uses vector[2][0][1:0] and the bottom
field MV uses vector[3][0][1:0]. Vector[r][0][0:1] for
r=2,3 can be computed as follows:

(a) Vector[r][0][0] = (vector[0][0][0] x
m[parity_ref][parity_pred] // 2) + dmvector[0].

(b) Vector[r][0][1] = (vector[0][0][1] x
m[parity_ref][parity_pred] // 2) +
e[parity_ref][parity_pred] + dmvector[1].

Note that m[parity_ref][parity_pred] and
e[parity_ref][parity_pred] are defined in Tables 7-11
and 7-12, respectively, in the MPEG-2 specification
(ISO/IEC 13818-2).

Moreover, "r" denotes the order of the MV, e.g.,
first, second, etc. r=0 denotes the first set of
MVs, and r=1 denotes the second set of MVs. Dual prime
prediction uses r=2 and r=3 to identify two extra sets
of MVs.

"//" denotes integer division with rounding to the
nearest integer.
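
Constraints 6 and 8 above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the values of m and e must be taken from the MPEG-2 specification tables and are passed in as parameters here, and rounding halves away from zero in "//" is an assumption about the tie-breaking rule.

```python
def mpeg4_f_code(f_code_horizontal, f_code_vertical):
    """Constraint 6: take the larger MPEG-2 f_code and subtract one."""
    return max(f_code_horizontal, f_code_vertical) - 1


def div_round_nearest(a, b):
    """The '//' operator: integer division rounded to the nearest integer.

    ASSUMPTION: exact halves are rounded away from zero.
    """
    quotient, remainder = divmod(abs(a), abs(b))
    if 2 * remainder >= abs(b):
        quotient += 1
    return quotient if (a >= 0) == (b > 0) else -quotient


def dual_prime_vector(vector, dmvector, m, e):
    """Constraint 8: derive vector[r][0][0:1] for r = 2 or 3.

    vector   -- (horizontal, vertical) components of vector[0][0]
    dmvector -- (dmvector[0], dmvector[1])
    m, e     -- table entries for the given parity_ref and parity_pred
    """
    horizontal = div_round_nearest(vector[0] * m, 2) + dmvector[0]
    vertical = div_round_nearest(vector[1] * m, 2) + e + dmvector[1]
    return horizontal, vertical
```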

4.4 Coding of intra MB in B-VOP 

Additional conversion is necessary when coding an
intra MB in a B-frame of an MPEG-2 bitstream (e.g., as
shown in FIG. 4). MPEG-4 replaces intra mode with
direct mode for B-VOP and hence an intra MB in a B-frame
has to be coded differently in the MPEG-4 syntax.
There are two practical solutions to this problem.

The first solution employs an architecture
similar to the front-to-back transcoder of FIG. 3 (no
buffer for the entire reference frame). MC is
performed against the previous MB (or the previous MB
without compensating the texture residual, at the expense
of extra memory with the size of one MB) in the
same VOP under the assumption that this MB is close
enough to its reference MB (its uncompensated version).
The MV for the intra MB equals the MV of the previous
MB offset by its MB distance.

The second solution uses an architecture similar
to the one shown in FIG. 4. It keeps the reference
frame for all I- and P-VOPs. Note that MC has to be
performed on all P-VOPs in this solution. The MV for
the intra MB is the same as the predicted MV (median of
its three neighbors) and MC is performed against the
reference MB pointed to by the derived MV.

5. Video downscaling in the compressed domain

Generally, video downscaling and size transcoding
have the same meaning. Downsampling means sub-sampling
with an anti-aliasing (low pass) filter, but
subsampling and downsampling are used interchangeably
herein.

Size transcoding becomes computationally intensive
when its input and output are in the compressed domain.
A video downscaling process which limits its operations
to the compressed domain (and, in effect, avoids
decoding and encoding processes) provides a much
reduced complexity. However, two new problems arise
with downscaling in the compressed domain, i.e.,
downsampling of DCT coefficients and MV data.

Recently, video downscaling algorithms in the
compressed domain have been discussed, but they do not
address the complete transcoding between MPEG-2 and
MPEG-4, which includes field-to-frame deinterlacing.
The present invention addresses this problem.

Subsections 5.1 and 5.2 provide solutions to the two
new problems in the downsampling process. The
implementation of a proposed size transcoder in
accordance with the invention is described in section 6
and FIGs. 5 and 6.

5.1 Subsampling of DCT block 

In frame-based video downscaling, it is necessary
to merge four 8x8 DCT blocks into a new 8x8 DCT block
(specific details involving a field block will be
described later). Moreover, the output block should be
a low pass version of the input blocks. This process
is carried out in the spatial domain by multiplying the
input matrix with a subsampling matrix (preferably with
a low pass filter). Multiplication by a subsampling
matrix in the spatial domain is equivalent to
multiplication by DCT coefficients of a subsampling
matrix in the DCT domain because of the distributive
property of the orthogonal transform. However, the
number of operations (computations) in the downsampling
process in the DCT domain for some downsampling filters
can be as high as the total number of operations of its
counterpart in the spatial domain. The solution to
this problem is to employ a downsampling matrix which
is sparse (e.g., a matrix that has relatively few
non-zero values, e.g., approximately 30% or less).

A sparse downsampling matrix may be based on the
orthogonal property between the DCT basis vectors and
the symmetry structure of the DCT basis vectors. One
approach, discussed in R. Dugad and N. Ahuja, "A Fast
Scheme For Downsampling And Upsampling In The DCT
Domain," International Conference on Image Processing
(ICIP) 99, incorporated herein by reference, takes the
lower 4x4 DCT coefficients from four processing blocks,
applies a 4x4 IDCT to each DCT subblock, forms a new
8x8 pixel block and applies an 8x8 DCT to obtain an
output block. The downsampling matrix can be
pre-calculated since the downsampling process is fixed. By
splitting the 8x8 DCT matrix into left and right
halves, about half of the downsampling matrix values
are zero because of the orthogonality between the
column of the 4x4 IDCT matrix and the row of both left
and right 8x4 DCT matrices. This operation (one
dimension) can be written mathematically as:

$$B = Tb = T \begin{bmatrix} T_4^t B_1 \\ T_4^t B_2 \end{bmatrix} = T_L T_4^t B_1 + T_R T_4^t B_2$$

where $b$ is an 8x1 spatial input vector, $B$ is its
corresponding 8x1 DCT vector, $b_1$ and $b_2$ are subsampled
4x1 vectors, $B_1$ and $B_2$ are lower 4x1 DCT vectors, $T$ is
the 8x8 DCT transform matrix, $T_4$ is the 4x4 DCT
transform matrix, and $T_L$ and $T_R$ are the left and right
halves of $T$. The superscript "t" denotes a matrix transpose.
Dugad's algorithm also employs the symmetry property of
the DCT basis vector to reduce the complexity of the
downsampling process. $T_L T_4^t$ and $T_R T_4^t$ are identical in
terms of magnitude ($T_L T_4^t(i,j) = (-1)^{i+j}\, T_R T_4^t(i,j)$) since
odd rows of $T$ are anti-symmetric and even rows of $T$ are
symmetric. "i" is a matrix row index, and "j" is a
matrix column index. Hence, both $T_L T_4^t$ and $T_R T_4^t$ can be
calculated based on the same components, i.e., a
symmetrical part, $E$ (indices for which i+j is even), and an
anti-symmetrical part, $O$ (indices for which i+j is odd)
($T_L T_4^t = E + O$ and $T_R T_4^t = E - O$). This arrangement effectively
reduces the number of multiplications by a factor of
two when the downsampling process is done as:

$$B = T_L T_4^t B_1 + T_R T_4^t B_2 = (E + O)B_1 + (E - O)B_2 = E(B_1 + B_2) + O(B_1 - B_2)$$
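
The E/O decomposition can be checked numerically. The following pure-Python sketch builds orthonormal DCT matrices, forms the left and right products, and applies the one-dimensional downsampling in the factored form. It is an illustration under stated assumptions: the orthonormal DCT-II normalisation is assumed, any overall scaling factor is ignored, and all names (dct_matrix, LEFT, RIGHT, downsample_1d) are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal n x n DCT-II matrix (rows are basis vectors)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


T = dct_matrix(8)
T4_T = [list(col) for col in zip(*dct_matrix(4))]   # transpose of T_4
LEFT = matmul([row[:4] for row in T], T4_T)         # T_L T_4^t, 8x4
RIGHT = matmul([row[4:] for row in T], T4_T)        # T_R T_4^t, 8x4

# E keeps the entries with i+j even, O the entries with i+j odd.
E = [[LEFT[i][j] if (i + j) % 2 == 0 else 0.0 for j in range(4)]
     for i in range(8)]
O = [[LEFT[i][j] if (i + j) % 2 == 1 else 0.0 for j in range(4)]
     for i in range(8)]


def downsample_1d(B1, B2):
    """Compute E(B1 + B2) + O(B1 - B2) for two lower 4x1 DCT vectors."""
    total = [x + y for x, y in zip(B1, B2)]
    diff = [x - y for x, y in zip(B1, B2)]
    return [sum(E[i][j] * total[j] + O[i][j] * diff[j] for j in range(4))
            for i in range(8)]
```

Because E and O have disjoint non-zero positions, each output coefficient needs only the multiplications of one 8x4 matrix rather than two.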

Implementation of Dugad's method to convert four
field blocks into one frame block is not as simple. An
extension of the downsampling process in this scenario
(one dimension) can be written as:

$$B = T\,(S_T T_4^t B_T + S_B T_4^t B_B)$$

where $B_T$ and $B_B$ are the lower 4x1 field vectors, and $S_T$ and
$S_B$ are DCT values of an 8x4 deinterlacing matrix
corresponding to its top, $S_T$, and bottom, $S_B$, field
block, respectively. Elements of $S_T$: $S_T(i,j) = 1$ if
($j = 2i$, $0 \le i \le 3$) and $S_T(i,j) = 0$ otherwise. Elements of $S_B$:
$S_B(i,j) = 1$ if ($j = 2i+1$, $0 \le i \le 3$) and $S_B(i,j) = 0$ otherwise.
This is a modification of Dugad's algorithm for
downsampling and deinterlacing in accordance with the
present invention.

The operations of downscaling and the
deinterlacing process are more complex since S and T
are not orthogonal to each other and, hence, the
downsampling matrix is not sparse. C. Yim and M.A.
Isnardi, "An Efficient Method For DCT-Domain Image
Resizing With Mixed Field/Frame-Mode Macroblocks," IEEE
Trans. Circ. and Syst. For Video Technol., vol. 9,
pp. 696-700, Aug. 1999, incorporated herein by
reference, propose an efficient method for downsampling
a field block. A low pass filter is integrated into
the deinterlacing matrix in such a way that the
downsampling matrix ($S = 0.5\,[I_8 \; I_8]$) is sparse.

$I_8$ denotes an 8x8 identity matrix, and $[I_8 \; I_8]$
denotes a 16x8 matrix that comprises a concatenation of
the two identity matrices. The identity matrix, of
course, has all ones on the diagonal and all zeroes
elsewhere.

The method starts with four 8x8 IDCT field blocks,
then applies the downsampling matrix, S, and performs
an 8x8 DCT to obtain the output block. Note that an
8x8 IDCT is used in this method instead of a 4x4 IDCT.
This operation can be shown mathematically (in one
dimension) as:

5.2 Subsampling of MV data

ME is the bottleneck of the entire video encoding
process. It is hence desirable to estimate a MV of the
resized MB by using the MVs of the four original MBs without
actually performing ME (assuming that all MBs are coded
in inter mode). Note that, if an MPEG-2 bitstream is
assumed, subsampling of MV data takes MVs of four MBs
since each MB has one input (only an MPEG-4 bitstream
can have a MV for every block). The simplest solution
is to average the four MVs together to obtain the new MV,
but it gives a poor estimate when those four MVs are
different. B. Shen, I.K. Sethi and B. Vasudev,
"Adaptive Motion-Vector Resampling For Compressed Video
Downscaling," IEEE Trans. Circ. and Syst. For Video
Technol., vol. 9, pp. 929-936, Sep. 1999, show that a
better result can be obtained by giving more weight to
the worst predicted MV. A matching accuracy, A, of
each MV is indicated by the number of nonzero AC
coefficients in that MB. By using the Shen et al.
technique, the new MV for the downscaled MB can be
computed as:
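
One way to read the weighting described above is as an activity-weighted mean of the four MVs, halved for 2:1 downscaling. The Python sketch below assumes exactly that form; it is not taken from the Shen et al. paper, the name weighted_mv is hypothetical, and falling back to a plain average when every activity is zero is an added assumption.

```python
def weighted_mv(mvs, activities):
    """Activity-weighted mean of the input MVs, scaled by one half.

    mvs        -- list of (horizontal, vertical) MVs of the original MBs
    activities -- matching accuracy A per MB, e.g. nonzero AC coefficient count
    ASSUMPTION: weights are A_i / sum(A); equal weights if all A_i are zero.
    """
    total = sum(activities)
    if total == 0:
        weights = [1.0 / len(mvs)] * len(mvs)
    else:
        weights = [a / total for a in activities]
    horizontal = sum(w * mv[0] for w, mv in zip(weights, mvs))
    vertical = sum(w * mv[1] for w, mv in zip(weights, mvs))
    return horizontal / 2, vertical / 2
```

An MB with many nonzero AC coefficients was poorly predicted, so its MV receives the largest weight.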


M.R. Hashemi, L. Winger and S. Panchanathan,
"Compressed Domain Motion Vector Resampling For
Downscaling Of MPEG Video," ICIP 99, propose a
nonlinear method to estimate the MV of the resized MB.
Similar to the algorithm in Shen et al., Hashemi's
technique uses spatial activity of the processing MBs
to estimate the new MV. A heuristic measurement,
called Maximum Average Correlation (MAC), is employed in
Hashemi's method to identify one of the four original
MVs to be the output MV. By using the MAC, the new MV
for the downscaled MB can be computed as:


where ρ is the spatial correlation and is set to 0.85,
and d_i is the Euclidean distance between the ith input
MV (MV_i) and the output MV.

6. Implementation of the size transcoder

FIG. 5 illustrates a size transcoder in accordance
with the invention. B frames may be present in the
input bitstream, but are discarded by the transcoder
and therefore do not appear in the output bitstream.

In the transcoder 500, a MV scaling function 510, 
DCT scaling function 520, and spatial scaling function 
540 are added. Switches 530 and 535 are coordinated so 
that, in a first setting, an output of the DCT function 
455 is routed into the quantisation function 340, and 
the switch 535 is closed to enable an output of the 
spatial scaling function 540 to be input to the adder 
445. In a second setting of the switches 530 and 535, 
an output of the DCT scaling function 520 is routed 
into the quantisation function 340, and the switch 535 
is open.

The transcoder 500 converts an MPEG-2 bitstream
into an MPEG-4 bitstream which corresponds to a smaller
size video, e.g., from ITU-R 601 (720x480) to SIF
(352x240).

To achieve a bandwidth requirement for the MPEG-4
bitstream, the transcoder 500 subsamples the video by
two in both the horizontal and vertical directions (at
the spatial scaling function 540) and skips all
B-frames (at temporal scaling functions 545 and 546),
thereby reducing the temporal resolution accordingly.
Note that the temporal scaling function 546 could
alternatively be provided after the DCT scaling
function 520. Skipping of B-frames before performing
downscaling reduces complexity.

Moreover, a low pass filter (which can be provided
in the spatial scaling function 540) prior to
subsampling should result in improved image quality.

The invention can be extended to include other
downsampling factors, and B-VOPs, with minor
modifications. Specifically, changes in MV downscaling
and mode decision are made. MV downscaling for B-VOP
is a direct extension of what was discussed to include
the backward MV. The mode decision for B-VOP can be
handled in a similar way as in the P-VOP (e.g., by
converting uni-directional MV into bi-directional MV as
in converting intra MB into inter MB in a P-VOP).

Below, we discuss six problems that are addressed
by the size transcoder 500. We also assume that the
input video has 704x480 pixel resolution and is coded with
an MP@ML MPEG-2 encoder, and that the desired output is a
simple profile MPEG-4 bitstream which contains SIF
progressive video (with a frame rate reduction by N).
However, the invention can be extended to other input
and output formats and resolutions as well.

6.1 Progressive Video MV downscaling (luma)

This problem appears when all four MBs are coded
as inter, and use frame prediction. Each MV in those
MBs is downscaled by two in each direction (horizontal
and vertical) to determine the MVs of the four blocks in
MPEG-4 (MPEG-4 allows one MV per 8x8 block). The
scaled MVs are then predictively encoded (using a
median filter) using the normal MPEG-4 procedure.

Note that each MB (comprising four blocks) has to
be coded in the same mode in both MPEG-2 and MPEG-4.
With video downscaling, the output MB (four blocks)
corresponds to four input MBs.

6.2 Interlaced Video MV downsampling (luma)

This problem exists when all four MBs are coded as
inter and use field prediction. We need to combine the two
field MVs in each MB to get a frame MV of the resized
block. Instead of setting the new MV based on the
spatial activity, the proposed transcoder picks the new
MV based on its neighbors' MVs. The MVs of all eight
surrounding MBs are used to find a predictor (field MVs
are averaged in case of an MB with field prediction). The
median value from these eight MVs becomes the predictor,
and the field MV of the current MB which is closer to it in
terms of Euclidean distance is scaled by two in the
horizontal direction to become the new MV.
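
The selection rule of section 6.2 can be sketched as follows. This is a minimal Python illustration, not the transcoder's implementation: the name select_field_mv is hypothetical, taking the median separately per component is an assumption (the text does not say how a median of vectors is formed), and ties are resolved in favour of the top field MV by assumption.

```python
import math
from statistics import median


def select_field_mv(neighbor_mvs, top_field_mv, bottom_field_mv):
    """Pick the field MV closer to the neighbor median and halve it horizontally.

    neighbor_mvs -- (horizontal, vertical) MVs of the eight surrounding MBs,
                    with field MVs already averaged for field-predicted MBs
    ASSUMPTION: the predictor is the component-wise median.
    """
    predictor = (median(mv[0] for mv in neighbor_mvs),
                 median(mv[1] for mv in neighbor_mvs))

    def distance(mv):
        return math.hypot(mv[0] - predictor[0], mv[1] - predictor[1])

    if distance(top_field_mv) <= distance(bottom_field_mv):
        chosen = top_field_mv
    else:
        chosen = bottom_field_mv
    # Only the horizontal component is scaled by two.
    return chosen[0] / 2, chosen[1]
```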

6.3 MV downsampling (chroma)

This problem happens when all four MBs are coded
as inter, and use either frame or field prediction
(MPEG-4 treats both prediction modes in the same way for
a chroma block). The process follows the MPEG-4 method
to obtain a chroma MV from a luma MV, i.e., a chroma MV
is the downscaled version of the average of its four
corresponding 8x8 luma MVs.

6.4 DCT downsampling (luma progressive/chroma)

This problem occurs when all four luma MBs are
coded as intra or inter, and use frame MB structure
(and their eight chroma blocks (four for Cr and four
for Cb) use either frame or field structure). Dugad's
method is used to downscale the luma and chroma DCT
blocks by a factor of two in each direction.

6.5 Interlaced DCT downsampling (luma)

This problem arises in one of two ways. First,
its associated MB uses field prediction and second, its
associated MB uses frame prediction. In either case,
we want to downscale four 8x8 field DCT blocks (two for
the top field, and two for the bottom field) into one
8x8 frame DCT block. The solution for the first case
is to use the same field DCT block as the one chosen
for MC. The second case involves deinterlacing and we
propose a combination of the Dugad and Yim methods,
discussed above.

Specifically, the transcoder first downscales four
field blocks in the vertical direction (and at the same
time performs deinterlacing) based on the Yim algorithm
to obtain two frame blocks. The transcoder then
downscales these two frame blocks in the horizontal
direction to get the output block using the Dugad
algorithm.

This is illustrated in FIG. 6, where four 8x8 
coefficient field-mode DCT blocks are shown at 600, two 
8x8 frame-mode DCT blocks are shown at 610, and one 8x8 
frame-mode DCT block is shown at 620. 

The procedure for DCT downscaling in accordance 
with the invention can be summarized as follows: 

1. Form the 16x16 coefficient input matrix by
combining four field blocks together as shown at 600.

2. For vertical downscaling and filtering, apply a
low pass (LP) filter D according to Yim's algorithm to
every row of the input matrix. The LP input matrix is
now 16x8 pixels, as shown at 610.

3. Form the 8x8 matrices $B_1$ and $B_2$ from the LP matrix.

4. Perform a horizontal downscaling operation
according to Dugad's algorithm to every column of $B_1$
and $B_2$ to obtain the 8x8 output matrix (620) as
follows:

$$B = B_1 (T_L T_4^t)^t + B_2 (T_R T_4^t)^t = (B_1 + B_2)E + (B_1 - B_2)O$$

where E and O denote even and odd rows as
discussed above.

In particular, a horizontal downsampling matrix
composed of odd "O" and even "E" matrices as follows
may be used (ignoring the scaling factor):

E = [e(0)  0     0     0,
     0     e(1)  0     e(2),
     0     0     0     0,
     0     e(3)  0     e(4),
     0     0     e(5)  0,
     0     e(6)  0     e(7),
     0     0     0     0,
     0     e(8)  0     e(9)].

O = [0     0     0     0,
     o(0)  0     o(1)  0,
     0     o(2)  0     0,
     o(3)  0     o(4)  0,
     0     0     0     0,
     o(5)  0     o(6)  0,
     0     0     0     o(7),
     o(8)  0     o(9)  0].

The following coefficients can be used:

e(0) =  4              o(0) =  2.56915448
e(1) =  0.831469612    o(1) = -0.149315668
e(2) =  0.045774654    o(2) =  2
e(3) =  1.582130167    o(3) = -0.899976223
e(4) = -0.195090322    o(4) =  1.026559934
e(5) =  2              o(5) =  0.601344887
e(6) = -0.704885901    o(6) =  1.536355513
e(7) =  0.980785280    o(7) =  2
e(8) =  0.906127446    o(8) = -0.509795579
e(9) =  1.731445835    o(9) = -0.750660555.

Essentially, the product of a DCT matrix which is
sparse is used as the downsampling matrix.

The technique may be extended generally for 2:1
downsizing of an NxN block that comprises four N/2xN/2
coefficient field-mode blocks. Other downsizing ratios
may also be accommodated.

6.6 Special cases

Special cases occur when all four MBs are not
coded in the same mode (not falling in any of the five
previous cases). We always assume that any intra or
skipped MB among the other inter MBs is in inter mode
with zero MV. Field MVs are merged based on section
6.2 to obtain a frame MV, and then we apply the
techniques of section 6.1. MC is recommended to
determine the texture of the intra block, which is
treated as an inter block with a zero MV by the
transcoder.

7. Conclusion

It should now be appreciated that the present
invention provides a transcoder architecture that
provides the lowest possible complexity with a small
error. This error is generated in the MPEG-4 texture
encoding process (QP coding, DC prediction, nonlinear
DC scaler). These processes should be removed in a
future profile of MPEG-4 to create a near-lossless
transcoding system.

The invention also provides complete details of a
size transcoder to convert a bitstream of ITU-R 601
interlaced video coded with MPEG-2 MP@ML into a
simple profile MPEG-4 bitstream which contains SIF
progressive video suitable for a streaming video
application.

For spatial downscaling of field-mode DCT blocks,
it is proposed to combine vertical and horizontal
downscaling techniques in a novel manner such that
sparse downsampling matrices are used in both the
vertical and horizontal direction, thereby reducing
computations of the transcoder.

Moreover, for MV downscaling, we propose using a
median value from the eight neighboring MVs. This
proposal works better than the algorithms in section 5.2
since our predicted MVs follow the global MV. It also
works well with an interlaced MB, which has only two
MVs instead of four MVs per MB.
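
A minimal sketch of a median-based MV selection is given below. It rests on assumptions that the text does not settle: the median is taken separately for the horizontal and vertical components, the lower of the two middle values is used when the number of candidates is even, and `median_mv` is a hypothetical helper name. Scaling the selected MV to the reduced picture size is not shown.

```python
from statistics import median_low
from typing import List, Tuple

MotionVector = Tuple[int, int]


def median_mv(neighbors: List[MotionVector]) -> MotionVector:
    """Component-wise median of the candidate MVs (assumed interpretation)."""
    if not neighbors:
        raise ValueError("at least one motion vector is required")
    xs = [mv[0] for mv in neighbors]
    ys = [mv[1] for mv in neighbors]
    # median_low returns an actual data value, so each component stays an integer
    return (median_low(xs), median_low(ys))


# Eight neighboring MVs; (40, -30) is an outlier that disagrees with the
# dominant (global) motion of roughly (4, 2).
neighbors = [(2, 0), (4, 2), (4, 2), (6, 2), (4, 0), (40, -30), (4, 2), (2, 2)]
predicted = median_mv(neighbors)
```

With these inputs the outlier is rejected and the selected MV follows the dominant motion. The same helper accepts a shorter candidate list, e.g., when interlaced MBs contribute only two MVs each.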

Although the invention has been described in 
connection with various preferred embodiments, it 
should be appreciated that various modifications and 
adaptations may be made thereto without departing from 
the scope of the invention as set forth in the claims. 

What is claimed is:

1. A method for transcoding a pre-compressed
input bitstream that is provided in a first video 
coding format, comprising the steps of: 

recovering header information of the input 
bitstream; 

providing corresponding header information in a 
second, different video coding format; 

partially decompressing the input bitstream to 
provide partially decompressed data; and 

re-compressing the partially decompressed data in
accordance with the header information in the second 
format to provide an output bitstream. 

2. The method of claim 1, wherein: 

the first and second video coding formats comprise 
an MPEG-2 format and an MPEG-4 format, respectively. 

3. The method of claim 1, wherein: 

the first video coding format comprises MPEG-2 
Main Profile at Main Level; and 

the second video coding format comprises a simple 
profile MPEG-4 bitstream with Standard Intermediate 
Format (SIF) progressive video. 

4. The method of claim 1, wherein: 

the partially decompressed data comprises motion
vectors and Discrete Cosine Transform (DCT)
coefficients; and

the second format comprises at least one of a new
mode decision, AC/DC prediction, and motion
compensation.

5. The method of claim 1, wherein: 

at least one look-up table is used to provide the 
corresponding header information in the second video 
coding format.

6. The method of claim 1, wherein:

downscaling is performed on the partially
decompressed data by downsampling DCT coefficients and
motion vector data thereof.

7. The method of claim 1, wherein: 

2:1 downscaling is performed on at least one group 
of four field-mode Discrete Cosine Transform (DCT) 
blocks of the partially decompressed data by performing 
vertical downsampling and de-interlacing thereto to
obtain a corresponding group of two frame-mode DCT
blocks, and performing horizontal downsampling to the 
two frame-mode DCT blocks to obtain one frame-mode DCT 
block. 

8. The method of claim 7, wherein: 

the vertical downsampling also achieves low pass
filtering of the four field-mode DCT blocks.

9. The method of claim 7, wherein: 

the vertical and horizontal downsampling use 
respective sparse matrixes.

10. The method of claim 1, wherein: 

in the recompressing step, a code (DQUANT) which 
specifies a change in a quantizer is set according to a 
differential value of a quantization parameter of the 
partially decompressed data. 

11. The method of claim 1, wherein: 

for re-compressing intra coded macroblocks, a
coded block pattern (CBP) is set according to a 
corresponding value of the partially decompressed data. 

12. The method of claim 1, wherein: 

for re-compressing non-intra coded macroblocks,
skipped macroblocks in the partially decompressed data 
are coded as not_coded macroblocks, where all Discrete 
Cosine Transform (DCT) coefficients have a zero value. 

13. The method of claim 1, wherein: 

in the recompressing step, predicted motion 
vectors in the partially decompressed data are reset 
according to the second format. 

14. The method of claim 1, wherein:

in the recompressing step, dual prime mode 
macroblocks of the partially decompressed data are 
converted into field-coded macroblocks. 

15. A method for performing 2:1 downscaling on 
video data, comprising the steps of: 

forming at least one input matrix of NxN Discrete 
Cosine Transform (DCT) coefficients from the video data
by combining four N/2xN/2 field-mode DCT blocks;

performing vertical downsampling and de-interlacing
to the input matrix to obtain two N/2xN/2
frame-mode DCT blocks;

forming an NxN/2 input matrix from the two
frame-mode DCT blocks; and

performing horizontal downsampling to the NxN/2 
matrix to obtain one N/2xN/2 frame-mode DCT block. 

16. The method of claim 15, wherein N=16. 

17. The method of claim 15, wherein: 

the vertical downsampling also achieves low pass 
filtering of the NxN input matrix. 

18. The method of claim 15, wherein:

the vertical downsampling uses a sparse
downsampling matrix.

19. The method of claim 18, wherein: 

the sparse downsampling matrix = 0.5 [I8 I8], where I8
is an 8x8 identity matrix.

20. The method of claim 15, wherein:

the horizontal downsampling uses a sparse 
downsampling matrix composed of odd "O" and even "E" 
matrices.

21. The method of claim 20, wherein: 
the even matrix has the following form: 
E = [e(0)  0     0     0,
     0     e(1)  0     e(2),
     0     0     0     0,
     0     e(3)  0     e(4),
     0     0     e(5)  0,
     0     e(6)  0     e(7),
     0     0     0     0,
     0     e(8)  0     e(9)]

where e(1) through e(9) are non-zero coefficients; and

the odd matrix has the following form:

O = [0     0     0     0,
     o(0)  0     o(1)  0,
     0     o(2)  0     0,
     o(3)  0     o(4)  0,
     0     0     0     0,
     o(5)  0     o(6)  0,
     0     0     0     o(7),
     o(8)  0     o(9)  0]

where o(1) through o(9) are non-zero coefficients.

22. An apparatus for transcoding a pre-compressed
input bitstream that is provided in a first video 
coding format, comprising: 

means for recovering header information of the 
input bitstream; 

means for providing corresponding header 
information in a second, different video coding format; 

means for partially decompressing the input 
bitstream to provide partially decompressed data; and 

means for re-compressing the partially
decompressed data in accordance with the header 
information in the second format to provide an output 
bitstream. 

23. An apparatus for performing 2:1 downscaling
on video data, comprising: 

means for forming at least one input matrix of NxN 
Discrete Cosine Transform (DCT) coefficients from the 
video data by combining four N/2xN/2 field-mode DCT 
blocks; 

means for performing vertical downsampling and de- 
interlacing to the input matrix to obtain two N/2xN/2 
frame-mode DCT blocks; 

means for forming an NxN/2 input matrix from the 
two frame-mode DCT blocks; and 

means for performing horizontal downsampling to
the NxN/2 matrix to obtain one N/2xN/2 frame-mode DCT 
block. 

a corresponding header in the new format for the output bitstream. In one embodiment (Fig. 3), a low complexity front-to-back
transcoder (with B frames disabled) avoids the need for motion compensation processing. In another embodiment (Fig. 4), a
transcoder architecture that minimizes drift error (with B frames enabled) is provided. In another embodiment (Fig. 5), a size
transcoder (with B frames enabled) is provided, e.g., to convert a bitstream of ITU-R 601 interlaced video coding with MPEG-2
MP@ML into a simple profile MPEG-4 bitstream which contains SIF progressive video suitable for a streaming video application.
For spatial downscaling of field-mode DCT blocks, vertical and horizontal downscaling techniques are combined to use sparse
matrixes to reduce computations.


WO 01/095633 A3

(88) Date of publication of the international search report:
22 August 2002




INTERNATIONAL SEARCH REPORT 


A. CLASSIFICATION OF SUBJECT MATTER

IPC 7 H04N7/26

International Application No

PCT/US 01/40811

According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practical, search terms used)

EPO-Internal, INSPEC, COMPENDEX, PAJ

C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category  Citation of document, with indication, where appropriate, of the relevant passages

Relevant to claim No.


DOGAN S ET AL: "EFFICIENT MPEG-4/H.263
VIDEO TRANSCODER FOR INTEROPERABILITY OF
HETEROGENEOUS MULTIMEDIA NETWORKS"
ELECTRONICS LETTERS, IEE STEVENAGE, GB,
vol. 35, no. 11, 27 May 1999 (1999-05-27), 
pages 863-864, XP000908120 
ISSN: 0013-5194 
paragraph 'Introduction! 
figure 1 

paragraph 'Methodology of transcoding! 


-/- 


1,22 


3-5, 
10-14 


[X] Further documents are listed in the continuation of box C.

Patent family members are listed in annex.


Special categories of cited documents:

"A" document defining the general state of the art which is not
considered to be of particular relevance

"E" earlier document but published on or after the international
filing date

"L" document which may throw doubts on priority claim(s) or
which is cited to establish the publication date of another
citation or other special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or
other means

"P" document published prior to the international filing date but
later than the priority date claimed

"T" later document published after the international filing date
or priority date and not in conflict with the application but
cited to understand the principle or theory underlying the
invention

"X" document of particular relevance; the claimed invention
cannot be considered novel or cannot be considered to
involve an inventive step when the document is taken alone

"Y" document of particular relevance; the claimed invention
cannot be considered to involve an inventive step when the
document is combined with one or more other such documents,
such combination being obvious to a person skilled
in the art.

"&" document member of the same patent family

Date of the actual completion of the international search

29 April 2002


Name and mailing address of the ISA

European Patent Office, P.B. 5818 Patentlaan 2
NL-2280 HV Rijswijk
Tel. (+31-70) 340-2040, Tx. 31 651 epo nl
Fax: (+31-70) 340-3016


Form PCT/ISA/210 (second sheet) (July 1992)


Date of mailing of the international search report


1 Z Oa 2002 


Authorized officer 


Georglou, G 




C. (Continuation) DOCUMENTS CONSIDERED TO BE RELEVANT


Category

Citation of document, with indication, where appropriate, of the relevant passages

Relevant to claim No.

Y 

WEE S J ET AL: "FIELD-TO-FRAME 
TRANSCODING WITH SPATIAL AND TEMPORAL 
DOWNSAMPLING" 

PROCEEDINGS 1999 INTERNATIONAL CONFERENCE
ON IMAGE PROCESSING. ICIP'99. KOBE, JAPAN,
OCT. 24 - 28, 1999, INTERNATIONAL
CONFERENCE ON IMAGE PROCESSING, LOS
ALAMITOS, CA: IEEE, US,
vol. 4 OF 4,

24 October 1999 (1999-10-24), pages
271-275, XP000895525
ISBN: 0-7803-5468-0
paragraph '05.1!; figure 4

2 

E 

PATENT ABSTRACTS OF JAPAN

vol. 2000, no. 24,

11 May 2001 (2001-05-11) 

& JP 2001 204026 A (SONY CORP), 

27 July 2001 (2001-07-27) 

abstract 

& US 2001/010706 A1 (TAKAHASHI KUNIAKI ET
AL) 2 August 2001 (2001-08-02) 
abstract; figure 11 

1-5, 

10-14,22 

A 

WEE S J ET AL: "EFFICIENT PROCESSING OF 
COMPRESSED VIDEO" 

CONFERENCE RECORD OF THE 32ND ASILOMAR
CONFERENCE ON SIGNALS, SYSTEMS &
COMPUTERS. PACIFIC GROVE, CA, NOV. 1-4,
1998, ASILOMAR CONFERENCE ON SIGNALS,
SYSTEMS AND COMPUTERS, NEW YORK, NY: IEEE,
US,

vol. 1, 1998, pages 855-859, XP001032864 
ISBN: 0-7803-5149-5 
paragraph '0003!; figure 1 

1-5, 

10-14,22 

A 

WO 97 39584 A (IMEDIA CORP)
23 October 1997 (1997-10-23) 
page 7, line 1 -page 10. line 15 

1-5, 

10-14,22 

A 

MORRISON D G ET AL: "REDUCTION OF
BIT-RATE OF COMPRESSED VIDEO WHILE IN ITS 
CODED FORM" 

INTERNATIONAL WORKSHOP ON PACKET VIDEO. 
XX. XX. 

1 September 1994 (1994-09-01), pages 
D171-D174, XP002075303 
the whole document 

1-5,

10-14,22

P,A 

EP 1 032 214 A (MATSUSHITA ELECTRIC IND CO 
LTD) 30 August 2000 (2000-08-30) 
page 6, line 34 -page 7, line 19 

-/-- 

1-5, 

10-14,22 




SONG J ET AL: "FAST EXTRACTION OF
SPATIALLY REDUCED IMAGE SEQUENCES FROM
MPEG-2 COMPRESSED VIDEO"
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS
FOR VIDEO TECHNOLOGY, IEEE INC. NEW YORK,
US,
vol. 9, no. 7, October 1999 (1999-10),
pages 1100-1114, XP000853341
ISSN: 1051-8215
paragraph '000V!

US 5 926 573 A (NAIMPALLY SAI PRASAD V ET 
AL) 20 July 1999 (1999-07-20) 
column 14, line 45 -column 16, line 56; 
figures 6,7A,7B

EP 0 794 674 A (HEWLETT PACKARD CO) 
10 September 1997 (1997-09-10) 
page 6, line 35 -page 16, line 30 

DUGAD R ET AL: "A fast scheme for 
downsampling and upsampling in the DCT
domain" 

IMAGE PROCESSING, 1999. ICIP 99. 
PROCEEDINGS. 1999 INTERNATIONAL CONFERENCE 
ON KOBE, JAPAN 24-28 OCT. 1999, 
PISCATAWAY, NJ, USA, IEEE, US, 
24 October 1999 (1999-10-24), pages 
909-913, XP010369046 
ISBN: 0-7803-5467-2 
cited In the application 
the whole document 

YIM C ET AL: "AN EFFICIENT METHOD FOR
DCT-DOMAIN IMAGE RESIZING WITH MIXED
FIELD/FRAME-MODE MACROBLOCKS"
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS
FOR VIDEO TECHNOLOGY, IEEE INC. NEW YORK,
US,
vol. 9, no. 5, August 1999 (1999-08),
pages 696-700, XP000848395
ISSN: 1051-8215
cited in the application
the whole document

-/-- 


15-20,23

6-9,21

6-9,
15-21,23

6-9,
15-21,23

6-9,
15-21,23

6-9,
15-21,23



A 

SHEN B ET AL: "ADAPTIVE MOTION-VECTOR
RESAMPLING FOR COMPRESSED VIDEO
DOWNSCALING"
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS
FOR VIDEO TECHNOLOGY, IEEE INC. NEW YORK,
US,
vol. 9, no. 6, September 1999 (1999-09),
pages 929-936, XP000848857
ISSN: 1051-8215
cited in the application
the whole document

6-9,
15-21,23




Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet)

This International Search Report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

1. Claims Nos.:
because they relate to subject matter not required to be searched by this Authority, namely:

2. Claims Nos.:
because they relate to parts of the International Application that do not comply with the prescribed requirements to such
an extent that no meaningful International Search can be carried out, specifically:

3. Claims Nos.:
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).


Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet)

This International Searching Authority found multiple inventions in this international application, as follows:

see additional sheet

As a result of the prior review under R. 40.2(e) PCT,
no additional fees are to be refunded.

1. [X] As all required additional search fees were timely paid by the applicant, this International Search Report covers all
searchable claims.

2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment
of any additional fee.

3. [ ] As only some of the required additional search fees were timely paid by the applicant, this International Search Report
covers only those claims for which fees were paid, specifically claims Nos.:

4. [ ] No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is
restricted to the invention first mentioned in the claims; it is covered by claims Nos.:

Remark on Protest

[X] The additional search fees were accompanied by the applicant's protest.
[ ] No protest accompanied the payment of additional search fees.




FURTHER INFORMATION CONTINUED FROM PCT/ISA/210


This International Searching Authority found multiple (groups of) 
inventions in this international application, as follows: 

1. Claims: 1-5,10-14,22 

Format and bit rate transcoder between two video formats by 
partially decompressing the bitstream. 

1.1. Claims: 1-5,22 

Format transcoder between MPEG-2 and MPEG-4 by
recovering and transforming the header and partially
decompressing the bitstream. The header transformation
is performed using a look-up table. The partially
decompressed data include motion vectors and DCT
coefficients.


1.2. Claims: 10-14 

Bit rate transcoder with partial decoding. 


2. Claims: 6-9, 15-21, 23

A method for downscaling video data in the DCT domain.


Please note that all inventions mentioned under item 1, although not
necessarily linked by a common inventive concept, could be searched
without effort justifying an additional fee.



Patent document            Publication    Patent family          Publication
cited in search report     date           member(s)              date

JP 2001204026 A            27-07-2001     US 2001010706 A1       02-08-2001

WO 9739584 A               23-10-1997     AU 2453897 A           07-11-1997
                                          CA 2249606 A1          23-10-1997
                                          EP 0893027 A1          27-01-1999
                                          JP [illegible]         11-07-2000
                                          WO [illegible]

EP 1032214 A               30-08-2000     CN 1264988 A           30-08-2000
                                          EP 1032214 A2          30-08-2000
                                          JP 2000312363 A        07-11-2000

US 5926573 A               20-07-1999     US 5737019 A           07-04-1998
                                          CN 1182331 A           20-05-1998
                                          EP 0786902 A1          30-07-1997
                                          JP 9233316 A           05-09-1997

EP 0794674 A               10-09-1997     US 5708732 A           13-01-1998
                                          EP 0794674 A2          10-09-1997
                                          EP 0798927 A2          01-10-1997
                                          JP 9331532 A           22-12-1997